Consolidated Tree Construction Algorithm: Structurally Steady Trees

نویسندگان

  • Jesús M. Pérez
  • Javier Muguerza
  • Olatz Arbelaitz
  • Ibai Gurrutxaga
چکیده

This paper presents a new methodology for building decision trees or classification trees (Consolidated Trees Construction algorithm) that faces up the problem of unsteadiness appearing in the paradigm when small variations in the training set happen. As a consequence, the understanding of the made classification is not lost, making this technique different from techniques such as bagging and boosting where the explanatory feature of the classification disappears. The presented methodology consists on a new metaalgorithm for building structurally more steady and less complex trees (consolidated trees), so that they maintain the explaining capacity and they are faster, but, without losing the discriminating capacity. The meta-algorithm uses C4.5 as base classifier. Besides the meta-algorithm, we propose a measure of the structural diversity used to analyse the stability of the structural component. This measure gives an estimation of the heterogeneity in a set of trees from the structural point of view. The obtained results have been compared with the ones get with C4.5 in some UCI Repository databases and a real application of customer fidelisation from a company of electrical appliances.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Consolidated Trees: An Analysis of Structural Convergence

When different subsamples of the same data set are used to induce classification trees, the structure of the built classifiers is very different. The stability of the structure of the tree is of capital importance in many domains, such as illness diagnosis, fraud detection in different fields, customer’s behaviour analysis (marketing), etc, where comprehensibility of the classifier is necessary...

متن کامل

Behavior of Consolidated Trees when using Resampling Techniques

Many machine learning areas use subsampling techniques with different objectives: reducing the size of the training set, equilibrate the class imbalance or non-uniform cost error, etc. Subsampling affects severely to the behavior of classification algorithms. Decision trees induced from different subsamples of the same data set are very different in accuracy and structure. This affects the expl...

متن کامل

A new algorithm to build consolidated trees: study of the error rate and steadiness

This paper presents a new methodology for building decision trees, Consolidated Trees Construction algorithm, that improves the behavior of C4.5. It reduces the error and the complexity of the induced trees, being the differences in the complexity statistically significant. The advantage of this methodology in respect to other techniques such as bagging, boosting, etc. is that the final classif...

متن کامل

The Effect of the Used Resampling Technique and Number of Samples in Consolidated Trees’ Construction Algorithm

In many pattern recognition problems, the explanation of the made classification becomes as important as the good performance of the classifier related to its discriminating capacity. For this kind of problems we can use Consolidated Trees ́ Construction (CTC) algorithm which uses several subsamples to build a single tree. This paper presents a wide analysis of the behavior of CTC algorithm for ...

متن کامل

CTC: An Alternative to Extract Explanation from Bagging

Being aware of the importance of classifiers to be comprehensible when using machine learning to solve real world problems, bagging needs a way to be explained. This work compares Consolidated Tree’s Construction (CTC) algorithm with the Combined Multiple Models (CMM) method proposed by Domingos when used to extract explanation of the classification made by bagging. The comparison has been done...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004